This analysis is based almost entirely on the code in notebooks 09 and 11.

We have latent variable expression analysis data:
- Latent Variable Table
- Latent Variables selected by Random Forest

We also use all samples for which there are gene variants (cNFs, pNFs, MPNSTs):
- Exome-Seq variants
- WGS variants

Lastly, we need to filter by genes that are expressed to avoid getting too many non-qualifying variants:
- RNA-Seq data
Let’s see if there are any LVs that split based on gene variant. Because we’re having trouble scaling with the number of latent variables, I only look at rare variants, those with a gnomAD allele frequency below 1% (`gnomAD_AF < 0.01` in the code below). Notice that this differs from notebook #11.
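As a minimal sketch of how an allele-frequency filter like this behaves, here is a toy table (the gene names, samples and frequencies are made up, and `max_af` is an illustrative parameter, not something defined elsewhere in this notebook). The point worth checking is that `subset()` silently drops rows whose condition evaluates to `NA`, so variants with no gnomAD frequency are removed unless they are kept explicitly.

```r
# Toy variant table; gene names and frequencies are made up for illustration.
toy.vars <- data.frame(
  Hugo_Symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  specimenID  = c("s1", "s1", "s2", "s3"),
  gnomAD_AF   = c(0.002, 0.20, NA, 0.009),
  stringsAsFactors = FALSE
)

max_af <- 0.01  # illustrative threshold, same value as the subset() call in this notebook

# subset() treats an NA condition as FALSE, so GENE_C (no gnomAD entry) is dropped
rare.strict <- subset(toy.vars, gnomAD_AF < max_af)

# Alternative: treat variants that are absent from gnomAD as rare and keep them
rare.keep.na <- subset(toy.vars, is.na(gnomAD_AF) | gnomAD_AF < max_af)

print(rare.strict$Hugo_Symbol)
print(rare.keep.na$Hugo_Symbol)
```

Whether variants without a gnomAD entry should count as rare is a judgment call; the sketch only shows that the two choices give different variant sets.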
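# Assumes a setup chunk has already loaded synapser (with synLogin() called), dplyr, tidyr and ggplot2
wgs.vars=synTableQuery("SELECT Hugo_Symbol,Protein_position,specimenID,IMPACT,FILTER,ExAC_AF,gnomAD_AF FROM syn20551862")$asDataFrame()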
exome.vars=synTableQuery("SELECT Hugo_Symbol,Protein_position,specimenID,IMPACT,FILTER,ExAC_AF,gnomAD_AF FROM syn20554939")$asDataFrame()
all.vars<-rbind(select(wgs.vars,'Hugo_Symbol','Protein_position','specimenID','IMPACT','gnomAD_AF'),
select(exome.vars,'Hugo_Symbol','Protein_position','specimenID','IMPACT','gnomAD_AF'))%>%
subset(gnomAD_AF<0.01)
tabids<-synTableQuery('select distinct tableId from syn21221980')$asDataFrame()
vars="specimenID,individualID,Symbol,totalCounts,zScore,tumorType,nf1Genotype,sex"
full.tab<-do.call(rbind,lapply(tabids$tableId,function(x) synTableQuery(paste('select',vars,'from',x))$asDataFrame()))
#let's only keep genes that are expressed (non-zero counts) in all samples
expr.genes<-full.tab%>%group_by(Symbol)%>%
summarize(minExpr=min(totalCounts))%>%
subset(minExpr>0)%>%ungroup()%>%select(Symbol)%>%
distinct()
top.lvs<-synTableQuery("SELECT * from syn21318452")$asDataFrame()
mp_res<-synTableQuery("SELECT * FROM syn21046991")$asDataFrame()%>%
filter(isCellLine != "TRUE")%>%
subset(latent_var%in%top.lvs$LatentVar)%>%
select(latent_var,id,value,specimenID,tumorType,modelOf,diagnosis)
For the purposes of this analysis we want only those samples with genomic data, only those latent variables that the Random Forest selected as predictive, and only those variants that fall in expressed genes.
expr.vars<-subset(all.vars,Hugo_Symbol%in%expr.genes$Symbol)
samps<-intersect(mp_res$specimenID,expr.vars$specimenID)
mp_res<-mp_res%>%
subset(specimenID%in%samps)#%>%
# group_by(latent_var) %>%
# mutate(sd_value = sd(value)) %>%
# filter(sd_value > 0.025) %>%
# ungroup()
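The two filters above (expressed genes, then samples shared between the LV table and the variant table) can be sketched on toy data as follows. Everything here is made up for illustration, including the table names `toy.expr` and `toy.vars`, and the sketch uses base R so that it runs without extra packages.

```r
# Toy expression table: one row per gene per sample (values are made up)
toy.expr <- data.frame(
  Symbol      = c("GENE_A", "GENE_A", "GENE_B", "GENE_B"),
  specimenID  = c("s1", "s2", "s1", "s2"),
  totalCounts = c(120, 35, 0, 14),
  stringsAsFactors = FALSE
)

# Minimum count per gene across samples; keep genes whose minimum is above zero
min.expr  <- aggregate(totalCounts ~ Symbol, data = toy.expr, FUN = min)
expressed <- as.character(min.expr$Symbol[min.expr$totalCounts > 0])

# Toy variant calls, restricted to expressed genes
toy.vars <- data.frame(
  Hugo_Symbol = c("GENE_A", "GENE_B", "GENE_A"),
  specimenID  = c("s1", "s2", "s3"),
  stringsAsFactors = FALSE
)
expr.toy.vars <- subset(toy.vars, Hugo_Symbol %in% expressed)

# Samples that have both latent variable scores and at least one qualifying variant
lv.samples <- c("s1", "s2", "s4")
shared <- intersect(lv.samples, expr.toy.vars$specimenID)
print(expressed)
print(shared)
```

Note that in this toy example `s2` drops out even though it has variant calls, because its only variant is in a gene that fails the expression filter. An intersection of this kind cannot tell "no qualifying variants" apart from "no genomic data".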
Let’s retrieve the LV data and summarize how many genes have mutations across samples.
data.with.var<-mp_res%>%
left_join(expr.vars,by='specimenID')
tab<-data.with.var
top.genes=tab%>%#group_by(tumorType)%>%
mutate(numSamps=n_distinct(specimenID))%>%
group_by(Hugo_Symbol)%>%
mutate(numMutated=n_distinct(specimenID))%>%
ungroup()%>%
subset(numMutated>1)%>%
subset(numMutated<(numSamps-1))%>%
select(tumorType,Hugo_Symbol,numSamps,numMutated)%>%distinct()
gene.count=top.genes%>%group_by(tumorType)%>%
mutate(numGenes=n_distinct(Hugo_Symbol))%>%
mutate(minMutated=min(numMutated))%>%
mutate(maxMutated=max(numMutated))%>%
select(tumorType,numGenes,minMutated,maxMutated)%>%distinct()
DT::datatable(gene.count)
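To make the gene filter concrete, here is a small base R sketch of the same rule on made-up data: count the distinct samples in which each gene is mutated, then keep genes mutated in more than one sample but in fewer than all-but-one. The names `toy`, `num.samps` and `num.mutated` are illustrative only.

```r
# Toy sample/gene pairs; GENE_B appears twice in s1 to show that samples are counted once
toy <- data.frame(
  specimenID  = c("s1", "s1", "s1", "s2", "s1", "s2", "s3", "s4", "s5"),
  Hugo_Symbol = c("GENE_A", "GENE_B", "GENE_B", "GENE_B",
                  "GENE_C", "GENE_C", "GENE_C", "GENE_C", "GENE_C"),
  stringsAsFactors = FALSE
)

# Total number of distinct samples in the table
num.samps <- length(unique(toy$specimenID))

# Number of distinct samples in which each gene is mutated
num.mutated <- vapply(split(toy$specimenID, toy$Hugo_Symbol),
                      function(x) length(unique(x)), integer(1))

# Same rule as above: more than one mutated sample, fewer than numSamps - 1
keep <- names(num.mutated)[num.mutated > 1 & num.mutated < (num.samps - 1)]
print(num.mutated)
print(keep)
```

Here `GENE_A` is dropped because it is mutated in a single sample, and `GENE_C` is dropped because it is mutated in every sample, so neither can support a two-group comparison.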
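## Test significance of each gene/latent variable pair
Now we can loop through every latent variable and gene with a Wilcoxon Rank Sum Test, comparing LV scores between mutated and wild-type samples, and correct for multiple testing within each LV (`p.adjust`, whose default is the Holm method).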
#red.genes<-c("NF1","SUZ12","CDKN2A","EED")##for testing
##first spread the WT/Mutated values
vals<-tab%>%subset(Hugo_Symbol%in%top.genes$Hugo_Symbol)%>%
mutate(mutated=ifelse(is.na(IMPACT),'WT','Mutated'))%>%
select(latent_var,tumorType,value,Hugo_Symbol,specimenID,mutated)%>%
distinct()%>%
spread(key=Hugo_Symbol,value='mutated',fill='WT')
##double check to make sure there are both mutated and unmutated values
counts<-vals%>%
gather(key=gene,value=status,-c(latent_var,tumorType,value,specimenID))%>%
select(latent_var,tumorType,value,gene,specimenID,status)%>%
group_by(latent_var,gene)%>%
mutate(numVals=n_distinct(status))%>%
mutate(numSamps=n_distinct(specimenID))%>%
subset(numVals==2)%>%ungroup()
#so now we have only LV/gene pairs with both Mutated and WT samples
with.sig<-counts%>%ungroup()%>%#subset(gene%in%top.genes$Hugo_Symbol)%>%
group_by(latent_var,gene)%>%
mutate(pval=wilcox.test(value~status)$p.value)%>%ungroup()%>%
group_by(latent_var)%>%
mutate(corP=p.adjust(pval))%>%ungroup()%>%
select(latent_var,gene,pval,corP)%>%distinct()
sig.vals<-subset(with.sig,corP<0.01)
DT::datatable(sig.vals%>%group_by(latent_var)%>%summarize(numGenes=n_distinct(gene)))
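One detail of the pipeline above is worth a sanity check: `p.adjust()` is called inside `mutate()` while the table still has one row per sample, so it appears to correct over samples × genes values rather than one p-value per gene. A minimal base R sketch on made-up data, which reduces to one row per test before correcting, might look like this (all names and values are illustrative; this is not the notebook's own method):

```r
# Toy long-format table: one row per latent variable, gene and sample (made-up values)
samples <- paste0("s", 1:8)
scores  <- c(0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4)  # LV score per sample
toy <- rbind(
  data.frame(latent_var = "LV_toy", gene = "GENE_A", specimenID = samples,
             value = scores, status = rep(c("WT", "Mutated"), each = 4),
             stringsAsFactors = FALSE),   # groups perfectly separated
  data.frame(latent_var = "LV_toy", gene = "GENE_B", specimenID = samples,
             value = scores, status = rep(c("WT", "Mutated"), times = 4),
             stringsAsFactors = FALSE)    # groups interleaved
)

# One Wilcoxon test per (latent_var, gene) pair, giving one row per test
groups <- split(toy, list(toy$latent_var, toy$gene), drop = TRUE)
tests <- do.call(rbind, lapply(groups, function(d) {
  data.frame(latent_var = d$latent_var[1], gene = d$gene[1],
             pval = wilcox.test(value ~ status, data = d)$p.value,
             stringsAsFactors = FALSE)
}))

# Correct within each latent variable; p.adjust now sees exactly one p-value per gene
tests$corP <- ave(tests$pval, tests$latent_var,
                  FUN = function(p) p.adjust(p, method = "holm"))
print(tests)
```

In dplyr terms the equivalent change would be to `summarize()` to one row per `latent_var`/`gene` before the `mutate(corP=p.adjust(pval))` step. Doing so would make the correction less conservative, so the set of significant genes could change.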
Interesting! Some genes actually pass p-value correction. What do they look like? Here let’s write the messiest possible code to print.
library(nationalparkcolors)
val<-park_palette('Acadia',2)
names(val)<-c('Mutated','WT')
for(ct in unique(sig.vals$latent_var)){
tplot<-sig.vals[which(sig.vals$latent_var==ct),]
if(nrow(tplot)==0)
next
print(ct)
sigs=tplot%>%rowwise()%>%mutate(vals=paste(gene,format(corP,digits=3),sep=':'))%>%select(vals)%>%unlist()%>%paste(collapse=',')
print(sigs)
p<-counts%>%
subset(latent_var==ct)%>%
subset(gene%in%tplot$gene)%>%
ggplot(aes(x=gene,y=value,col=status))+
geom_boxplot(outlier.shape=NA)+
geom_point(position=position_jitterdodge(),aes(shape=tumorType,col=status,group=status))+
# apply theme_bw() first; a complete theme added later would reset the axis text rotation
theme_bw()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
ggtitle(paste(ct,'scores\n',sigs))+
scale_color_manual(values=val)# if(method=='cibersort')
# p<-p+scale_y_log10()
print(p)
}
## [1] "451,REACTOME_MITOCHONDRIAL_PROTEIN_IMPORT"
## [1] "ANP32B:0.000683,CDC27:0.000346,CTBP2:0.0032,CTDSP2:0.000346,FAM104B:0.000346,GGT1:0.000724,IGSF3:0.000346,SLC25A5:0.000346,ZNF717:0.000346"
## [1] "720,PID_FANCONI_PATHWAY"
## [1] "ANP32B:4.21e-05,CDC27:0.000112,CTBP2:6.67e-05,CTDSP2:0.000112,FAM104B:0.000112,GGT1:8.75e-05,IGSF3:0.000112,SLC25A5:0.000112,ZNF717:0.000112"
## [1] "LV 185"
## [1] "ANP32B:2.45e-05,CDC27:2.49e-06,CTBP2:0.000683,CTDSP2:2.49e-06,FAM104B:2.49e-06,GGT1:5.82e-05,IGSF3:2.49e-06,SLC25A5:2.49e-06,ZNF717:2.49e-06"
## [1] "LV 308"
## [1] "ANP32B:0.00178,CDC27:0.000167,CTBP2:0.00034,CTDSP2:0.000167,FAM104B:0.000167,IGSF3:0.000167,SLC25A5:0.000167,ZNF717:0.000167"
## [1] "LV 376"
## [1] "ANP32B:0.00424,CDC27:0.000676,CTBP2:0.00945,CTDSP2:0.000676,FAM104B:0.000676,GGT1:0.00526,IGSF3:0.000676,SLC25A5:0.000676,ZNF717:0.000676"
## [1] "LV 384"
## [1] "ANP32B:3.5e-06,CDC27:2.49e-06,CTBP2:0.00239,CTDSP2:2.49e-06,FAM104B:2.49e-06,GGT1:5.82e-05,IGSF3:2.49e-06,SLC25A5:2.49e-06,ZNF717:2.49e-06"
## [1] "LV 442"
## [1] "ANP32B:0.000487,CDC27:0.000485,CTBP2:0.00131,CTDSP2:0.000485,FAM104B:0.000485,GGT1:0.000528,IGSF3:0.000485,SLC25A5:0.000485,ZNF717:0.000485"
## [1] "LV 445"
## [1] "ANP32B:0.00424,CDC27:0.000485,CTDSP2:0.000485,FAM104B:0.000485,GGT1:0.00178,IGSF3:0.000485,SLC25A5:0.000485,ZNF717:0.000485"
## [1] "LV 546"
## [1] "ANP32B:0.00131,CDC27:0.000485,CTDSP2:0.000485,FAM104B:0.000485,IGSF3:0.000485,SLC25A5:0.000485,ZNF717:0.000485"
## [1] "LV 624"
## [1] "ANP32B:0.000105,CDC27:2.98e-05,CTBP2:0.00239,CTDSP2:2.98e-05,FAM104B:2.98e-05,GGT1:0.000188,IGSF3:2.98e-05,SLC25A5:2.98e-05,ZNF717:2.98e-05"
## [1] "LV 835"
## [1] "ANP32B:0.00034,CDC27:4.72e-05,CTBP2:0.00239,CTDSP2:4.72e-05,FAM104B:4.72e-05,GGT1:0.00027,IGSF3:4.72e-05,SLC25A5:4.72e-05,ZNF717:4.72e-05"
## [1] "LV 849"
## [1] "ANP32B:0.00945,CDC27:0.000112,CTBP2:0.00558,CTDSP2:0.000112,FAM104B:0.000112,GGT1:0.00526,IGSF3:0.000112,SLC25A5:0.000112,ZNF717:0.000112"
## [1] "LV 851"
## [1] "ANP32B:0.000487,CDC27:2.49e-06,CTBP2:3.5e-06,CTDSP2:2.49e-06,FAM104B:2.49e-06,GGT1:1.36e-05,IGSF3:2.49e-06,SLC25A5:2.49e-06,ZNF717:2.49e-06"
## [1] "LV 984"
## [1] "ANP32B:0.000952,CDC27:2.49e-06,CTBP2:0.0032,CTDSP2:2.49e-06,FAM104B:2.49e-06,GGT1:1.36e-05,IGSF3:2.49e-06,SLC25A5:2.49e-06,ZNF717:2.49e-06"
## [1] "31,SVM B cells naive"
## [1] "CDC27:0.000676,CTDSP2:0.000676,FAM104B:0.000676,GGT1:0.000529,IGSF3:0.000676,SLC25A5:0.000676,ZNF717:0.000676"
## [1] "4,REACTOME_NEURONAL_SYSTEM"
## [1] "CDC27:0.000676,CTBP2:0.000953,CTDSP2:0.000676,FAM104B:0.000676,GGT1:0.00235,IGSF3:0.000676,SLC25A5:0.000676,ZNF717:0.000676"
## [1] "LV 229"
## [1] "CDC27:0.000676,CTDSP2:0.000676,FAM104B:0.000676,GGT1:0.000986,IGSF3:0.000676,SLC25A5:0.000676,ZNF717:0.000676"
## [1] "LV 644"
## [1] "CDC27:0.000167,CTBP2:0.000105,CTDSP2:0.000167,FAM104B:0.000167,IGSF3:0.000167,SLC25A5:0.000167,ZNF717:0.000167"
## [1] "LV 653"
## [1] "CDC27:0.00126,CTBP2:0.00729,CTDSP2:0.00126,FAM104B:0.00126,IGSF3:0.00126,SLC25A5:0.00126,ZNF717:0.00126"
## [1] "LV 665"
## [1] "CDC27:0.0017,CTDSP2:0.0017,FAM104B:0.0017,IGSF3:0.0017,SLC25A5:0.0017,ZNF717:0.0017"
## [1] "LV 72"
## [1] "CDC27:0.00301,CTDSP2:0.00301,FAM104B:0.00301,GGT1:0.00679,IGSF3:0.00301,SLC25A5:0.00301,ZNF717:0.00301"
## [1] "LV 864"
## [1] "CDC27:0.00126,CTDSP2:0.00126,FAM104B:0.00126,IGSF3:0.00126,SLC25A5:0.00126,ZNF717:0.00126"
#}
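I’m not sure how to interpret this - most LVs are associated with the same set of mutated genes, and within each LV six of them (CDC27, CTDSP2, FAM104B, IGSF3, SLC25A5, ZNF717) have identical corrected p-values, which would be consistent with those genes being mutated in the same samples. Not sure why this is.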
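One way to probe this, sketched here on made-up data, is to check whether the recurring genes are mutated in exactly the same samples. Genes with identical WT/Mutated splits necessarily get identical Wilcoxon p-values for a given LV, so such genes would always appear together. The table `toy.calls` and its contents are hypothetical.

```r
# Toy mutation calls (made-up genes and samples)
toy.calls <- data.frame(
  gene       = c("GENE_A", "GENE_A", "GENE_B", "GENE_B", "GENE_C", "GENE_C"),
  specimenID = c("s1", "s2", "s2", "s1", "s1", "s3"),
  stringsAsFactors = FALSE
)

# Collapse each gene's mutated samples into a single sorted signature string
signature <- vapply(split(toy.calls$specimenID, toy.calls$gene),
                    function(x) paste(sort(unique(x)), collapse = ","),
                    character(1))

# Genes sharing a signature are mutated in exactly the same samples
same.pattern <- split(names(signature), signature)
print(same.pattern)
```

If the real genes fall into one signature group, the repeated gene lists would reflect a shared set of samples rather than independent associations with each LV.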
#this is a failed attempt to group by tumor type
#with.sig<-counts%>%ungroup()%>%subset(gene%in%top.genes$Hugo_Symbol)%>%
# group_by(latent_var,tumorType,gene)%>%
# mutate(pval=t.test(value~status)$p.value)%>%
# ungroup()%>%
# group_by(latent_var)%>%
# mutate(corP=p.adjust(pval))%>%ungroup()%>%
# select(latent_var,tumorType,gene,pval,corP)%>%distinct()
#sig.vals<-subset(with.sig,corP<0.05)
#DT::datatable(sig.vals)